AITopics | dc analysis

Collaborating Authors

dc analysis

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Nonlinear Data Integration via Kernel Methods for Data Collaboration Analysis

Suetake, Yamato, Kawakami, Yuta, Ikeda, Shunnosuke, Takano, Yuichi

arXiv.org Machine LearningMay-27-2026

Collaborative analysis of decentralized confidential datasets is important, but direct sharing of original datasets is often restricted by privacy and institutional constraints. Data collaboration (DC) analysis transforms each dataset into privacy-preserving intermediate representations via party-specific obfuscation functions and integrates them into common collaboration representations using an anchor dataset. However, many existing DC analysis methods rely on linear transformations for data obfuscation and integration, which may increase reconstruction risk. Although nonlinear dimensionality reduction can mitigate this risk, conventional linear integration methods cannot accurately align intermediate representations produced by nonlinear transformations. Moreover, existing integration methods mainly minimize discrepancies among parties and do not explicitly incorporate geometric or target-variable information useful for downstream analysis. To overcome these limitations, we first formulate linear kernel integration (LKI) as a linear integration method and then kernelize it to obtain nonlinear kernel integration (NKI). NKI admits a globally optimal solution via kernel ridge regression and an eigenvalue problem. We also introduce graph regularization and a centering constraint so that the target representation can capture geometric and target-variable information useful for downstream analysis. Experiments on image classification tasks demonstrate that NKI improves classification accuracy over existing linear integration methods under nonlinear dimensionality reduction, with further gains from target-variable-aware graph regularization and centering. The results also show that dimensionality reduction choices substantially affect both classification accuracy and reconstruction risk.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2605.27219

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.75)

Add feedback

Anomaly Detection in Double-entry Bookkeeping Data by Federated Learning System with Non-model Sharing Approach

Mashiko, Sota, Kawamata, Yuji, Nakayama, Tomoru, Sakurai, Tetsuya, Okada, Yukihiko

arXiv.org Artificial IntelligenceJan-22-2025

Anomaly detection is crucial in financial auditing and effective detection often requires obtaining large volumes of data from multiple organizations. However, confidentiality concerns hinder data sharing among audit firms. Although the federated learning (FL)-based approach, FedAvg, has been proposed to address this challenge, its use of mutiple communication rounds increases its overhead, limiting its practicality. In this study, we propose a novel framework employing Data Collaboration (DC) analysis -- a non-model share-type FL method -- to streamline model training into a single communication round. Our method first encodes journal entry data via dimensionality reduction to obtain secure intermediate representations, then transforms them into collaboration representations for building an autoencoder that detects anomalies. We evaluate our approach on a synthetic dataset and real journal entry data from multiple organizations. The results show that our method not only outperforms single-organization baselines but also exceeds FedAvg in non-i.i.d. experiments on real journal entry data that closely mirror real-world conditions. By preserving data confidentiality and reducing iterative communication, this study addresses a key auditing challenge -- ensuring data confidentiality while integrating knowledge from multiple audit firms. Our findings represent a significant advance in artificial intelligence-driven auditing and underscore the potential of FL methods in high-security domains.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.12723

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.05)
North America > United States > Hawaii (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Audit (0.66)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Non-readily identifiable data collaboration analysis for multiple datasets including personal information

Imakura, Akira, Sakurai, Tetsuya, Okada, Yukihiko, Fujii, Tomoya, Sakamoto, Teppei, Abe, Hiroyuki

arXiv.org Artificial IntelligenceAug-30-2022

Multi-source data fusion, in which multiple data sources are jointly analyzed to obtain improved information, has considerable research attention. For the datasets of multiple medical institutions, data confidentiality and cross-institutional communication are critical. In such cases, data collaboration (DC) analysis by sharing dimensionality-reduced intermediate representations without iterative cross-institutional communications may be appropriate. Identifiability of the shared data is essential when analyzing data including personal information. In this study, the identifiability of the DC analysis is investigated. The results reveals that the shared intermediate representations are readily identifiable to the original data for supervised learning. This study then proposes a non-readily identifiable DC analysis only sharing non-readily identifiable data for multiple medical datasets including personal information. The proposed method solves identifiability concerns based on a random sample permutation, the concept of interpretable DC analysis, and usage of functions that cannot be reconstructed. In numerical experiments on medical datasets, the proposed method exhibits a non-readily identifiability while maintaining a high recognition performance of the conventional DC analysis. For a hospital dataset, the proposed method exhibits a nine percentage point improvement regarding the recognition performance over the local analysis that uses only local dataset.

dc analysis, intermediate representation, representation, (14 more...)

arXiv.org Artificial Intelligence

2208.14611

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
North America > United States > California (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.54)
(4 more...)

Add feedback

Another Use of SMOTE for Interpretable Data Collaboration Analysis

Imakura, Akira, Kihira, Masateru, Okada, Yukihiko, Sakurai, Tetsuya

arXiv.org Artificial IntelligenceAug-26-2022

Recently, data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions. DC analysis centralizes individually constructed dimensionality-reduced intermediate representations and realizes integrated analysis via collaboration representations without sharing the original data. To construct the collaboration representations, each institution generates and shares a shareable anchor dataset and centralizes its intermediate representation. Although, random anchor dataset functions well for DC analysis in general, using an anchor dataset whose distribution is close to that of the raw dataset is expected to improve the recognition performance, particularly for the interpretable DC analysis. Based on an extension of the synthetic minority over-sampling technique (SMOTE), this study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage. Numerical results demonstrate the efficiency of the proposed SMOTE-based method over the existing anchor data constructions for artificial and real-world datasets. Specifically, the proposed method achieves 9 percentage point and 38 percentage point performance improvements regarding accuracy and essential feature selection, respectively, over existing methods for an income dataset. The proposed method provides another use of SMOTE not for imbalanced data classifications but for a key technology of privacy-preserving integrated analysis.

dataset, recognition performance, smote, (13 more...)

arXiv.org Artificial Intelligence

2208.12458

Country: Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Banking & Finance (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Add feedback